Fourier Based Fast Multipole Method for the Helmholtz Equation
The fast multipole method (FMM) has had great success in reducing the
computational complexity of solving the boundary integral form of the Helmholtz
equation. We present a formulation of the Helmholtz FMM that uses Fourier basis
functions rather than spherical harmonics. By modifying the transfer function
in the precomputation stage of the FMM, time-critical stages of the algorithm
are accelerated by causing the interpolation operators to become
straightforward applications of fast Fourier transforms, retaining the
diagonality of the transfer function, and providing a simplified error
analysis. Using Fourier analysis, constructive algorithms are derived to a
priori determine an integration quadrature for a given error tolerance. Sharp
error bounds are derived and verified numerically. Various optimizations are
considered to reduce the number of quadrature points and reduce the cost of
computing the transfer function. Comment: 24 pages, 13 figures
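The abstract's key acceleration is that interpolation between quadrature grids becomes a plain FFT application. A minimal sketch of that mechanism, trigonometric interpolation by zero-padding Fourier coefficients, is shown below in NumPy; this illustrates the general technique, not the paper's actual operators.

```python
import numpy as np

def fft_interpolate(samples, m):
    """Upsample a periodic, band-limited signal from n to m > n points by
    zero-padding its Fourier coefficients (trigonometric interpolation).
    Assumes the signal's spectral content lies strictly below the Nyquist
    frequency of the coarse grid."""
    n = len(samples)
    coeffs = np.fft.fft(samples)
    padded = np.zeros(m, dtype=complex)
    half = n // 2
    padded[:half] = coeffs[:half]          # non-negative frequencies
    padded[-(n - half):] = coeffs[half:]   # negative frequencies
    # Rescale so the inverse transform reproduces the original sample values.
    return np.fft.ifft(padded) * (m / n)

# A band-limited test function sampled coarsely, then interpolated to a
# finer grid; the result matches the exact function to machine precision.
n, m = 16, 64
x = 2 * np.pi * np.arange(n) / n
f = np.cos(3 * x) + 0.5 * np.sin(2 * x)
fine = fft_interpolate(f, m)
x_fine = 2 * np.pi * np.arange(m) / m
exact = np.cos(3 * x_fine) + 0.5 * np.sin(2 * x_fine)
```

Because both transforms cost O(m log m), interpolation inherits FFT complexity, which is the advantage the abstract claims for the Fourier basis over spherical-harmonic interpolation.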
Tensor Contractions with Extended BLAS Kernels on CPU and GPU
Tensor contractions constitute a key computational ingredient of numerical multi-linear algebra. However, as the order and dimension of tensors grow, the time and space complexities of tensor-based computations grow quickly. In this paper, we propose and evaluate new BLAS-like primitives that are capable of performing a wide range of tensor contractions on CPU and GPU efficiently. We begin by focusing on single-index contractions involving all the possible configurations of second-order and third-order tensors. Then, we discuss extensions to more general cases. Existing approaches for tensor contractions spend large amounts of time restructuring the data, which typically involves explicit copy and transpose operations. In this work, we summarize existing approaches and present library-based approaches that avoid memory movement. Through systematic benchmarking, we demonstrate that our approach can achieve 10x speedup on a K40c GPU and 2x speedup on dual-socket Haswell-EP CPUs, using CUBLAS and MKL respectively, for small and moderate tensor sizes. This is relevant in many machine learning applications such as deep learning, where tensor sizes tend to be small but require numerous tensor contraction operations to be performed successively. Concretely, we implement a Tucker decomposition and show that using our kernels yields at least an order-of-magnitude speedup compared to state-of-the-art libraries.
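To make the single-index contraction concrete, the sketch below (illustrative NumPy, not the paper's kernels) computes C[m,n,p] = Σₖ A[m,k]·B[k,n,p] as a batch of GEMMs over strided matrix views, the access pattern a strided-batched BLAS primitive exploits to avoid copy and transpose steps.

```python
import numpy as np

# Single-index contraction: C[m,n,p] = sum_k A[m,k] * B[k,n,p].
rng = np.random.default_rng(0)
M, N, P, K = 4, 5, 6, 3
A = rng.standard_normal((M, K))
B = rng.standard_normal((K, N, P))
C = np.empty((M, N, P))

# One GEMM per batch index p. Each slice B[:, :, p] is a strided view into
# the original buffer (no copy), mirroring how a strided-batched kernel
# walks memory: fixed matrix dimensions plus a constant stride per batch.
for p in range(P):
    C[:, :, p] = A @ B[:, :, p]

# Cross-check against an explicit einsum of the same contraction.
reference = np.einsum('mk,knp->mnp', A, B)
```

On a GPU, the whole loop collapses into a single batched-GEMM launch, which is where the reported speedups for small tensors come from: one kernel launch instead of P, and no data restructuring.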
Tensor Contractions with Extended BLAS Kernels on CPU and GPU
Tensor contractions constitute a key computational ingredient of numerical
multi-linear algebra. However, as the order and dimension of tensors grow, the
time and space complexities of tensor-based computations grow quickly. Existing
approaches for tensor contractions typically involve explicit copy and
transpose operations. In this paper, we propose and evaluate a new BLAS-like
primitive STRIDEDBATCHEDGEMM that is capable of performing a wide range of
tensor contractions on CPU and GPU efficiently. Through systematic
benchmarking, we demonstrate the advantages of our approach over conventional
approaches. Concretely, we implement the Tucker decomposition and show that
using our kernels yields a 100x speedup compared to an implementation using
existing state-of-the-art libraries.
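The Tucker decomposition mentioned in both abstracts is built from mode-n products, each of which reduces to a single GEMM on an unfolded tensor. A minimal NumPy sketch of that reduction (illustrative, not the authors' implementation) is:

```python
import numpy as np

def mode_n_product(T, U, n):
    """Mode-n product T x_n U: contract factor U[j, i_n] against mode n of T.
    Implemented as one GEMM on the mode-n unfolding, the operation the
    extended BLAS kernels perform without materializing the unfolding."""
    T_moved = np.moveaxis(T, n, 0)        # bring mode n to the front
    shape = T_moved.shape
    flat = T_moved.reshape(shape[0], -1)  # mode-n unfolding (matrix view)
    out = (U @ flat).reshape((U.shape[0],) + shape[1:])
    return np.moveaxis(out, 0, n)

# Reconstruct a third-order tensor from a Tucker core and factor matrices.
rng = np.random.default_rng(1)
core = rng.standard_normal((2, 3, 2))
factors = [rng.standard_normal((5, 2)),
           rng.standard_normal((6, 3)),
           rng.standard_normal((4, 2))]
T = core
for n, U in enumerate(factors):
    T = mode_n_product(T, U, n)

# The chain of mode products equals the direct multi-linear contraction.
reference = np.einsum('abc,ia,jb,kc->ijk', core, *factors)
```

Since every step is a GEMM (or, over a batch dimension, a strided batched GEMM), a Tucker factorization or reconstruction maps directly onto the primitive the paper proposes.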
Industrialization and Structural Crisis: On the Economic Transformation of Spain in the Liberalization Period
UuStB Koeln(38)-8Y8589 / FIZ - Fachinformationszentrum Karlsruhe / TIB - Technische Informationsbibliothek (SIGLE, DE, German)